GENERAL OBJECTIVES
The overall objective of this research is to develop efficient methods for solving linear and nonlinear systems of equations on parallel computers and supercomputers, and to apply these methods to problems in structural analysis. So far, attention has been given only to linear equations.
Factorization Methods on the FLEX/32
The Choleski method first factors $K$ into $LL^T$, the product of a lower triangular matrix $L$ and its transpose. The solution of the displacement equations $Kx = f$ is then completed by solving the triangular systems $Lz = f$ and $L^T x = z$.
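The two triangular solves can be sketched in Python as follows. This is a minimal illustration, assuming the factor L is already available and stored as a dense list of rows (not the banded storage of the FLEX/32 codes); the function names are our own.

```python
def forward_substitution(L, f):
    """Solve L z = f, where L is lower triangular (a list of rows)."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        s = f[i] - sum(L[i][j] * z[j] for j in range(i))
        z[i] = s / L[i][i]
    return z


def back_substitution_transpose(L, z):
    """Solve L^T x = z without forming L^T: (L^T)[i][j] is L[j][i]."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = z[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))
        x[i] = s / L[i][i]
    return x


def choleski_solve(L, f):
    """Given the Choleski factor L of K = L L^T, solve K x = f."""
    z = forward_substitution(L, f)
    return back_substitution_transpose(L, z)


# Usage: L = [[2, 0], [1, 3]] gives K = L L^T = [[4, 2], [2, 10]].
L = [[2.0, 0.0], [1.0, 3.0]]
f = [8.0, 22.0]            # f = K x for x = (1, 2)
print(choleski_solve(L, f))  # prints [1.0, 2.0]
```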
[Figure: kji-form banded Choleski code, with inner loops "for j = k+1 to min(k+p,N)" and "for i = j to min(k+p,N)", beside its shorthand version using cdiv(k) and cmod(j,k); Wrapped Interleaved Column Storage, with columns distributed over processor 1 ... processor p.]




The first figure shows a basic Choleski factorization code in the so-called kji form (L is computed by columns using immediate updates). For simplicity, the code uses banded storage with bandwidth p. To the right of the first code is the same code in shorthand: cdiv(k) forms the kth column of L, and cmod(j,k) modifies column j using column k.
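The kji loop structure and its cdiv/cmod shorthand can be sketched in Python as below. This is an illustrative sketch under our own assumptions, not the FLEX/32 code: indices are 0-based, the matrix is a dense list of rows in which only the lower band is touched (rather than true banded storage), and owner and busy_processors are hypothetical helpers that model wrapped interleaved column storage by assigning column j to processor j mod P.

```python
import math


def cdiv(L, k, p):
    """Form column k of L: square root of the pivot, then scale the band below it."""
    n = len(L)
    L[k][k] = math.sqrt(L[k][k])
    for s in range(k + 1, min(k + p, n - 1) + 1):
        L[s][k] /= L[k][k]


def cmod(L, j, k, p):
    """Modify column j using column k: l_ij -= l_ik * l_jk for i in the band."""
    n = len(L)
    for i in range(j, min(k + p, n - 1) + 1):
        L[i][j] -= L[i][k] * L[j][k]


def banded_choleski_kji(K, p):
    """kji-form Choleski of a symmetric positive definite K with bandwidth p.

    Only the lower band (0 <= i - j <= p) is copied and updated; entries
    outside it stay zero because a band matrix has no fill outside its band.
    """
    n = len(K)
    L = [[float(K[i][j]) if 0 <= i - j <= p else 0.0 for j in range(n)]
         for i in range(n)]
    for k in range(n):
        cdiv(L, k, p)                                   # column k of L is now final
        for j in range(k + 1, min(k + p, n - 1) + 1):
            cmod(L, j, k, p)                            # immediate updates
    return L


def owner(j, nprocs):
    """Hypothetical wrapped column mapping: column j goes to processor j mod nprocs."""
    return j % nprocs


def busy_processors(k, n, p, nprocs):
    """Processors that own a column needing cmod(j, k) at step k."""
    return {owner(j, nprocs) for j in range(k + 1, min(k + p, n - 1) + 1)}


# Usage: a tridiagonal (p = 1) example whose factor is exact in floating point.
print(banded_choleski_kji([[4, 2, 0], [2, 5, 2], [0, 2, 5]], 1))
print(len(busy_processors(0, 1000, 15, 16)))    # columns 1..15 -> 15 processors
```

Because step k updates only columns k+1 through k+p, at most p processors have cmod work at that step under this mapping, however many processors are available.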
Choleski on the FLEX/32
[Table note, partly illegible: for problems too large for a single processor ...]


This table shows running times of the parallel Choleski code for the panel focus problem on the FLEX/32. Timings are given for 1, 2, 4, 8 and 16 processors, together with the corresponding speedups and Mflop rates. The speedups are calculated relative to a serial code. For comparison, times are also given for the parallel code on a single processor.
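Speedups and Mflop rates of this kind are simple ratios, sketched below. We assume speedup = (serial code time) / (parallel time), following the statement that speedups are computed from a serial code, and we count operations directly from the kji banded loops; the counting convention (one unit per square root, division, multiply and subtract), the function names, and the timings in the usage lines are our own hypothetical choices, not FLEX/32 measurements.

```python
def banded_choleski_flops(n, p):
    """Count operations in the kji banded Choleski loops (0-based indices).

    Assumed convention: one unit each for a square root, a division, a
    multiply and a subtract. An interior step costs (p + 1)**2 units, so the
    total is roughly n * (p + 1)**2.
    """
    total = 0
    for k in range(n):
        last = min(k + p, n - 1)
        total += 1 + (last - k)                  # cdiv(k): sqrt + divisions
        for j in range(k + 1, last + 1):
            total += 2 * (last - j + 1)          # cmod(j, k): multiply + subtract
    return total


def speedup(t_serial, t_parallel):
    """Speedup relative to a serial code, not the parallel code on one processor."""
    return t_serial / t_parallel


def mflop_rate(ops, seconds):
    """Millions of operations per second."""
    return ops / (seconds * 1.0e6)


# Usage with a hypothetical size and made-up timings (not measured data).
ops = banded_choleski_flops(1000, 50)
t_serial, t_16 = 40.0, 4.0
print(speedup(t_serial, t_16), mflop_rate(ops, t_16))
```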



Parallel Choleski Mast and Panel Results 


This figure plots the speedups for the panel focus problems as well as for two different mast problems. The first mast problem has a small bandwidth (15), and very poor speedups are obtained. The second mast problem has a bandwidth of about 50, and the speedups are better. The panel problems have much larger bandwidths, as given in the table, and
Conjugate Gradient Iteration
[Figure: running on the panel focus problem]


The second type of method is iterative: the conjugate gradient method with two different preconditioners, SSOR polynomial and incomplete Choleski factorization. These methods have been used to solve both the panel and the mast focus problems on the FLEX/32. The incomplete Choleski codes for the FLEX/32 use the FORCE package developed
Preconditioned Conjugate Gradient Code




[Figure: preconditioned conjugate gradient code. The key parts are the matrix-vector product $Kp_k$ and the preconditioning step.]


A code is given for the preconditioned conjugate gradient method, where $(\cdot,\cdot)$ denotes the inner product of two vectors. The first two statements compute the next iterate $x_{k+1}$ by minimizing the quadratic function $x^T K x - 2x^T f$ along the line $x_k - \alpha p_k$. The residual at $x_{k+1}$ is then computed, followed by a test for convergence. This test can be based on, for example, the size of $(r_{k+1}, r_{k+1})$.
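A minimal Python sketch of the preconditioned conjugate gradient iteration follows, assuming K is symmetric positive definite and supplied as a matrix-vector product function. It writes the line as $x_k + \alpha p_k$ with residual $r = f - Kx$, which differs from the $x_k - \alpha p_k$ convention above only in the sign of $\alpha$. The Jacobi (diagonal) preconditioner in the usage lines is a simple stand-in, not the SSOR polynomial or incomplete Choleski preconditioner, and the names pcg, matvec and precond are hypothetical.

```python
def dot(u, v):
    """Inner product (u, v)."""
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, precond, f, x0, eps, maxit=1000):
    """Preconditioned conjugate gradient for K x = f, K symmetric positive definite.

    matvec(v) returns K v; precond(r) returns z, an approximate solution of
    M z = r for a symmetric positive definite preconditioner M. Stops when
    (r, r) < eps. Returns (x, number of iterations).
    """
    x = list(x0)
    r = [fi - kx for fi, kx in zip(f, matvec(x))]      # r_0 = f - K x_0
    if dot(r, r) < eps:
        return x, 0
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    for k in range(1, maxit + 1):
        q = matvec(p)                                   # key part 1: K p_k
        # alpha minimizes x^T K x - 2 x^T f along x_k + alpha p_k,
        # since (p_k, r_k) = (z_k, r_k) in exact arithmetic.
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]   # r_{k+1} = r_k - alpha K p_k
        if dot(r, r) < eps:                             # convergence test
            return x, k
        z = precond(r)                                  # key part 2: preconditioning
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxit


# Usage on a 2 x 2 example with a Jacobi (diagonal) stand-in preconditioner.
K = [[4.0, 1.0], [1.0, 3.0]]
x, its = pcg(lambda v: [dot(row, v) for row in K],
             lambda r: [r[i] / K[i][i] for i in range(len(r))],
             [1.0, 2.0], [0.0, 0.0], 1e-20)
print(x, its)    # x is close to (1/11, 7/11)
```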
Preconditioners
Note: Multicoloring is used to parallelize SOR.
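To illustrate why multicoloring parallelizes SOR, here is a sketch of red-black SSOR for the model matrix tridiag(-1, 2, -1), a hypothetical stand-in for the structural matrices above. With a red-black coloring, each red point depends only on black neighbors and vice versa, so all points of one color can be relaxed simultaneously. We assume the SSOR polynomial preconditioner is applied as m SSOR sweeps from a zero initial guess (an m-step formulation); the source does not give these details here, and the function names are our own.

```python
def relax_color(z, r, color, omega):
    """SOR-relax all points of one color for tridiag(-1, 2, -1) z = r.

    Red points (even i) have only black neighbors (odd i) and vice versa, so
    the updates inside this loop are independent and could run in parallel.
    """
    n = len(z)
    for i in range(color, n, 2):
        left = z[i - 1] if i > 0 else 0.0
        right = z[i + 1] if i < n - 1 else 0.0
        gauss_seidel = (r[i] + left + right) / 2.0
        z[i] = (1.0 - omega) * z[i] + omega * gauss_seidel


def ssor_sweep(z, r, omega):
    """One SSOR sweep in red-black order: forward (red, black), then backward (black, red)."""
    for color in (0, 1, 1, 0):
        relax_color(z, r, color, omega)


def m_step_ssor(r, m, omega):
    """Preconditioning step z ~ K^{-1} r, taken as m SSOR sweeps from z = 0 (assumed form)."""
    z = [0.0] * len(r)
    for _ in range(m):
        ssor_sweep(z, r, omega)
    return z


# Usage: a 2-sweep preconditioner application, and a many-sweep solve as a check.
print(m_step_ssor([1.0] * 7, 2, 1.2))
print(m_step_ssor([1.0] * 7, 400, 1.2))   # approaches the exact solution of K z = r
```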
SSOR Preconditioned Conjugate Gradient


This table gives the number of iterations and the running times on the FLEX/32 for the conjugate gradient and SSOR preconditioned conjugate gradient codes. The mast and panel focus problems are the same as those used for the Choleski factorization. The convergence criterion used was $(r_{k+1}, r_{k+1}) < 10^{-6}$, which gives about four decimal places
[Figure: speedup of the preconditioned conjugate gradient method on the FLEX/32 versus number of processors]


This chart shows the speedups of the SSOR polynomial preconditioned conjugate gradient method. These speedups are slightly lower than those for the unpreconditioned conjugate gradient method but are still satisfactory.
This chart compares the run times, in seconds, of the Choleski factorization and preconditioned conjugate gradient methods on the FLEX/32. Times are given for three sizes of the panel focus problem and for 1 and 16 processors.
Conjugate Gradient for PANEL.648
The running times for an iterative method depend critically on the convergence criterion. This table shows the results of varying the parameter $\varepsilon$ in the convergence test $(r_{k+1}, r_{k+1}) < \varepsilon$ for the conjugate gradient method.
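The effect of $\varepsilon$ can be reproduced qualitatively on a small model problem. The sketch below runs unpreconditioned conjugate gradient on the one-dimensional matrix tridiag(-1, 2, -1) of order n with right-hand side f = (1, ..., 1), a hypothetical stand-in for PANEL.648, and reports how many iterations are needed before $(r_{k+1}, r_{k+1}) < \varepsilon$. Because the iterates themselves do not depend on $\varepsilon$, tightening the criterion can only keep or increase the iteration count.

```python
def cg_iterations(n, eps, maxit=10000):
    """Unpreconditioned CG on tridiag(-1, 2, -1) x = f with f = (1, ..., 1), x_0 = 0.

    Returns (iterations until (r, r) < eps, final x).
    """
    def matvec(v):
        return [2.0 * v[i] - (v[i - 1] if i > 0 else 0.0)
                - (v[i + 1] if i < n - 1 else 0.0) for i in range(n)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    x = [0.0] * n
    r = [1.0] * n              # r_0 = f - K x_0 = f
    p = list(r)
    rr = dot(r, r)
    k = 0
    while rr >= eps and k < maxit:
        q = matvec(p)
        alpha = rr / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
        k += 1
    return k, x


# Usage: iteration counts for successively tighter criteria (n = 200 is arbitrary).
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    print(eps, cg_iterations(200, eps)[0])
```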
The chart shows what can be achieved by preconditioning. The problem is a three-dimensional Poisson-type
Conjugate Gradient on CRAY-2
Conjugate Gradient Convergence: $(r_{k+1}, r_{k+1}) < 10^{\dots}$







Summary and Conclusions 